Robust learning with implicit residual networks
In this effort, we propose a new deep architecture utilizing residual blocks
inspired by implicit discretization schemes. In contrast to standard
feed-forward networks, the outputs of the proposed implicit residual blocks are
defined as fixed points of appropriately chosen nonlinear transformations. We
show that this choice improves the stability of both the forward and backward
propagations, has a favorable impact on generalization, and allows one to
control the robustness of the network with only a few hyperparameters. In
addition, the proposed reformulation of ResNet
does not introduce new parameters and can potentially lead to a reduction in
the number of required layers due to improved forward stability. Finally, we
derive a memory-efficient training algorithm, propose a stochastic
regularization technique, and provide numerical results in support of our
findings.
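
As an illustration of the fixed-point construction described above, the
following minimal PyTorch sketch (our own illustration; the class name, the
choice of f, and the solver settings are assumptions, not the paper's code)
computes the block output as a fixed point of y = x + f(y). Gradients here flow
through the unrolled iterations, whereas the paper derives a memory-efficient
training algorithm that avoids this cost.

```python
import torch
import torch.nn as nn

class ImplicitResidualBlock(nn.Module):
    """Residual block whose output y is a fixed point of y = x + f(y),
    an implicit (backward-Euler-style) update, instead of the explicit
    ResNet update y = x + f(x). Note that f has the same parameters as
    a standard residual block, so no new parameters are introduced."""

    def __init__(self, dim, max_iters=30, tol=1e-4):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                               nn.Linear(dim, dim))
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x):
        y = x  # initialize the iteration at the block input
        for _ in range(self.max_iters):
            y_next = x + self.f(y)  # one fixed-point iteration
            if torch.norm(y_next - y) < self.tol * (torch.norm(y) + 1e-12):
                return y_next
            y = y_next
        return y

block = ImplicitResidualBlock(dim=8)
out = block(torch.randn(4, 8))  # output has shape (4, 8)
```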
A Dynamically Adaptive Sparse Grid Method for Quasi-Optimal Interpolation of Multidimensional Analytic Functions
In this work we develop a dynamically adaptive sparse grid (SG) method for
quasi-optimal interpolation of multidimensional analytic functions defined over
a product of one-dimensional bounded domains. The goal of this approach is to
construct an interpolant in the polynomial space corresponding to the "best
M-terms," based on a sharp a priori estimate of the polynomial coefficients. In
the past, SG
methods have been successful in achieving this, with a traditional construction
that relies on the solution of a knapsack problem: only the most profitable
hierarchical surpluses are added to the SG. However, this approach requires
additional sharp estimates related to the size of the analytic region and the
norm of the interpolation operator, i.e., the Lebesgue constant. Instead, we
present an iterative SG procedure that adaptively refines an estimate of the
region and accounts for the effects of the Lebesgue constant. Our approach does
not require any a priori knowledge of the analyticity region or the operator
norm, is easily generalized to both affine and non-affine analytic functions,
and can be applied to sparse grids built from one-dimensional rules with
arbitrary growth
of the number of nodes. In several numerical examples, we utilize our
dynamically adaptive SG to interpolate quantities of interest related to the
solutions of parametrized elliptic and hyperbolic PDEs, and compare the
performance of our quasi-optimal interpolant to several alternative SG schemes.
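
For reference, the traditional knapsack-style construction mentioned above can
be sketched as a greedy selection of the most profitable multi-indices under an
assumed a priori coefficient decay (plain Python/NumPy; the decay model
rho[k]**(-nu[k]) and all names are illustrative assumptions). The paper's
adaptive procedure goes further by iteratively refining the estimate of the
analytic region and accounting for the Lebesgue constant, which this sketch
omits.

```python
import heapq
import numpy as np

def quasi_optimal_index_set(rho, n_terms):
    """Greedily select multi-indices nu with the largest a priori
    coefficient estimates prod_k rho[k]**(-nu[k]) (an assumed
    anisotropic decay model with each rho[k] > 1), i.e. the
    knapsack-style "most profitable surpluses first" construction."""
    dims = len(rho)
    est = lambda nu: float(np.prod([rho[k] ** (-nu[k]) for k in range(dims)]))
    root = (0,) * dims
    heap = [(-est(root), root)]  # max-heap via negated estimates
    seen, selected = {root}, []
    while heap and len(selected) < n_terms:
        _, nu = heapq.heappop(heap)
        selected.append(nu)
        # enqueue forward neighbors; the monotone decay keeps the
        # selected set (approximately) downward closed
        for k in range(dims):
            child = tuple(n + (j == k) for j, n in enumerate(nu))
            if child not in seen:
                seen.add(child)
                heapq.heappush(heap, (-est(child), child))
    return selected

print(quasi_optimal_index_set(rho=[2.0, 4.0], n_terms=6))
```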
Greedy Shallow Networks: An Approach for Constructing and Training Neural Networks
We present a greedy approach to constructing an efficient single-hidden-layer
neural network with ReLU activation that approximates a target function. In our
approach, we obtain a shallow network by applying a greedy algorithm to a
prescribed dictionary built from the available training data and a set of
possible inner weights. To facilitate the greedy selection
process we employ an integral representation of the network, based on the
ridgelet transform, that significantly reduces the cardinality of the
dictionary and hence makes the greedy selection tractable. Our approach
allows for the construction of efficient architectures which can be treated
either as improved initializations to be used in place of random-based
alternatives, or, in certain cases, as fully trained networks, thus potentially
eliminating the need for backpropagation training. Numerical experiments
demonstrate the viability of the proposed approach and its advantages over
conventional techniques for selecting architectures and initializations for
neural networks.
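
The greedy selection step can be illustrated with a generic orthogonal matching
pursuit sketch over a fixed dictionary of ReLU atoms (NumPy; all names are
hypothetical, and the user-supplied candidate weights W and biases b stand in
for the ridgelet-based dictionary reduction of the paper, which is not
implemented here).

```python
import numpy as np

def greedy_shallow_relu(X, y, W, b, n_neurons):
    """OMP-style greedy construction of a one-hidden-layer ReLU network.
    X: (n, d) training inputs; y: (n,) targets;
    W: (m, d) candidate inner weights, b: (m,) candidate biases
    (together, the dictionary of ReLU ridge atoms)."""
    Phi = np.maximum(X @ W.T + b, 0.0)            # (n, m) atom evaluations
    norms = np.linalg.norm(Phi, axis=0) + 1e-12
    residual = y.astype(float).copy()
    chosen, coef = [], np.zeros(0)
    for _ in range(n_neurons):
        scores = np.abs(Phi.T @ residual) / norms  # normalized correlations
        scores[chosen] = -np.inf                   # do not reselect atoms
        chosen.append(int(np.argmax(scores)))
        A = Phi[:, chosen]
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # refit outer weights
        residual = y - A @ coef
    return chosen, coef

# toy usage: approximate f(x) = |x| on [-1, 1] with two ReLU neurons
X = np.linspace(-1, 1, 200).reshape(-1, 1)
y = np.abs(X).ravel()
W = np.array([[1.0], [-1.0], [1.0], [-1.0]])
b = np.array([0.0, 0.0, 0.5, 0.5])
idx, c = greedy_shallow_relu(X, y, W, b, n_neurons=2)
```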